Talk by Joshua Vogelstein of Johns Hopkins University. Given to the Redwood Center for Theoretical Neuroscience at UC Berkeley.
Abstract Determining whether certain properties are related to other properties is fundamental to scientific discovery. As data collection rates accelerate, it is becoming increasingly difficult yet ever more important to determine whether one property of data (e.g., cloud density) is related to another (e.g., grass wetness). Only if two properties are related are further investigations into the geometry of the relationship warranted. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes in real data scenarios, and do not provide insight into the geometry underlying the structure of the relationship. We juxtapose hypothesis testing, manifold learning, and harmonic analysis to obtain Multiscale Generalized Correlation (MGC). Our key insight is that one can adaptively restrict the analysis to the "jointly local" observations - that is, one can estimate the scale with the most informative neighbors for determining the existence and geometry of a relationship. We prove that to achieve a given true positive rate, MGC typically requires far fewer samples than existing methods for all investigated dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship. We used MGC to detect the presence and reveal the geometry of the relationships between mental and brain properties, to perform a proteomics screening, and to develop an imaging biomarker for disease, while avoiding the false positive inflation problems that have plagued conventional parametric approaches. Our open source implementation of MGC is easy to use, parameter-free, and applicable to previously vexing statistical questions that are ubiquitous in science, government, finance, and other disciplines.